Insider Threat


Adapting Insider Risk mitigations for Agentic Misalignment: an empirical study

Gomez, Francesca

arXiv.org Artificial Intelligence

Agentic misalignment occurs when goal-directed agents take harmful actions, such as blackmail, rather than risk goal failure, and can be triggered by replacement threats, autonomy reduction, or goal conflict (Lynch et al., 2025). We adapt insider-risk control design (Critical Pathway; Situational Crime Prevention) to develop preventative operational controls that steer agents toward safe actions when facing stressors. Using the blackmail scenario from the original Anthropic study by Lynch et al. (2025), we evaluate mitigations across 10 LLMs and 66,600 samples. Our main finding is that an externally governed escalation channel, which guarantees a pause and independent review, reduces blackmail rates from a no-mitigation baseline of 38.73% to 1.21% (averaged across all models and conditions). Augmenting this channel with compliance email bulletins further lowers the blackmail rate to 0.85%. Overall, incorporating preventative operational controls strengthens defence-in-depth strategies for agentic AI. We also surface a failure mode diverging from Lynch et al. (2025): two models (Gemini 2.5 Pro, Grok-4) take harmful actions without goal conflict or imminent autonomy threat, leveraging sensitive information for coercive signalling. In counterfactual swaps, both continued to leverage the affair regardless of whether the CEO or CTO was implicated. An escalation channel eliminated the coercion, but Gemini 2.5 Pro and Grok-4 escalated more often when the CTO was implicated (by 19 and 7 percentage points, respectively), whereas most models escalated more in the CEO condition. The reason for this divergent behaviour is not clear from the raw outputs; it could reflect benign differences in reasoning or strategic discrediting of a potential future threat, and warrants further investigation.
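
As an implementation aside, the externally governed escalation channel studied above amounts to a tool the agent can invoke that guarantees a pause and independent human review. Below is a minimal sketch of how such a channel might be exposed in an agent framework; the tool name, schema, and handler are illustrative assumptions, not the authors' implementation.

# Hypothetical escalation channel exposed to an agent as a tool.
# Names and schema are illustrative, not the controls evaluated above.
import json
import queue
import uuid

ESCALATION_TOOL = {
    "name": "escalate_to_governance",
    "description": (
        "Pause the current task and refer a conflict (goal conflict, "
        "replacement threat, discovery of sensitive information) to an "
        "independent human review board for an external decision."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "summary": {"type": "string",
                        "description": "Neutral description of the conflict."},
            "severity": {"type": "string", "enum": ["low", "medium", "high"]},
        },
        "required": ["summary", "severity"],
    },
}

review_queue: "queue.Queue[dict]" = queue.Queue()

def handle_escalation(summary: str, severity: str) -> str:
    """Queue the case for independent review and halt autonomous action."""
    case_id = str(uuid.uuid4())
    review_queue.put({"id": case_id, "summary": summary, "severity": severity})
    # The agent loop treats this result as terminal: no further autonomous
    # actions are dispatched until a human reviewer responds.
    return json.dumps({"case_id": case_id, "status": "paused_for_review"})

The key property is that the pause is enforced outside the model: once the tool is called, the surrounding loop stops dispatching actions, so safe behaviour does not depend on the model's continued cooperation.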


AI-Driven IRM: Transforming insider risk management with adaptive scoring and LLM-based threat detection

Koli, Lokesh, Kalra, Shubham, Thakur, Rohan, Saifi, Anas, Singh, Karanpreet

arXiv.org Artificial Intelligence

Insider threats pose a significant challenge to organizational security, often evading traditional rule-based detection systems due to their subtlety and contextual nature. This paper presents an AI-powered Insider Risk Management (IRM) system that integrates behavioral analytics, dynamic risk scoring, and real-time policy enforcement to detect and mitigate insider threats with high accuracy and adaptability. We introduce a hybrid scoring mechanism: transitioning from the static PRISM model to an adaptive AI-based model utilizing an autoencoder neural network trained on expert-annotated user activity data. Through iterative feedback loops and continuous learning, the system reduces false positives by 59% and improves true positive detection rates by 30%, demonstrating substantial gains in detection precision. Additionally, the platform scales efficiently, processing up to 10 million log events daily with sub-300 ms query latency, and supports automated enforcement actions for policy violations, reducing manual intervention. The IRM system's deployment resulted in a 47% reduction in incident response times, highlighting its operational impact. Future enhancements include integrating explainable AI, federated learning, graph-based anomaly detection, and alignment with Zero Trust principles to further elevate its adaptability, transparency, and compliance-readiness. This work establishes a scalable and proactive framework for mitigating emerging insider risks in both on-premises and hybrid environments.
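
The adaptive scoring idea can be illustrated with a minimal sketch: an autoencoder trained only on normal user-activity feature vectors, with reconstruction error serving as the risk score. The architecture, feature dimension, and random stand-in data below are assumptions, not the production model described in the paper.

# Minimal autoencoder-based risk scoring sketch (PyTorch).
# Train on normal activity; score new events by reconstruction error.
import torch
import torch.nn as nn

class ActivityAutoencoder(nn.Module):
    def __init__(self, n_features: int, bottleneck: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def risk_scores(model: ActivityAutoencoder, x: torch.Tensor) -> torch.Tensor:
    """Risk score per sample = mean squared reconstruction error."""
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)

model = ActivityAutoencoder(n_features=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
normal = torch.randn(512, 20)   # stand-in for annotated normal activity
for _ in range(100):
    opt.zero_grad()
    loss = ((model(normal) - normal) ** 2).mean()
    loss.backward()
    opt.step()
print(risk_scores(model, torch.randn(4, 20)))  # higher = more anomalous

In the paper's setting, these raw scores would feed the dynamic risk-scoring and policy-enforcement layers rather than being used directly.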


Scalable and Ethical Insider Threat Detection through Data Synthesis and Analysis by LLMs

Gelman, Haywood, Hastings, John D.

arXiv.org Artificial Intelligence

Insider threats wield an outsized influence on organizations, disproportionate to their small numbers, because of the internal access insiders have to systems, information, and infrastructure. Signals for such risks may be found in anonymous submissions to public web-based job search site reviews. This research studies the potential for large language models (LLMs) to analyze and detect insider threat sentiment within job site reviews. To address ethical data-collection concerns, it uses LLM-based synthetic data generation alongside existing job review datasets. Sentiment scores generated by LLMs are benchmarked against expert human scoring in a comparative analysis. Findings reveal that LLMs align with human evaluations in most cases, effectively identifying nuanced indicators of threat sentiment. Performance is lower on human-generated data than on synthetic data, suggesting areas for improvement in evaluating real-world data. Text diversity analysis found differences between the human-generated and LLM-generated datasets, with synthetic data exhibiting somewhat lower diversity. Overall, the results demonstrate the applicability of LLMs to insider threat detection and offer a scalable solution for insider sentiment testing, overcoming ethical and logistical barriers tied to data acquisition.
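
A hedged sketch of the scoring protocol this abstract describes: prompt an LLM to rate the threat sentiment of each review, then benchmark against human scores. The prompt wording, model name, scale, and example reviews are assumptions, not the paper's exact setup.

# Hypothetical LLM-based threat-sentiment scoring of job-site reviews.
from openai import OpenAI
from statistics import correlation  # Pearson r, Python 3.10+

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def threat_sentiment(review: str) -> float:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model, not the paper's choice
        messages=[
            {"role": "system",
             "content": "Rate the insider-threat sentiment of this employer "
                        "review from 0 (none) to 10 (severe). Reply with a "
                        "number only."},
            {"role": "user", "content": review},
        ],
    )
    return float(resp.choices[0].message.content.strip())

reviews = [
    "Management ignores us; someone should leak their client list.",
    "Great benefits, friendly team, would recommend.",
    "Pay is fine but the access controls are a joke; tempting for some.",
]
human_scores = [8.0, 0.0, 4.0]   # invented expert scores for illustration
llm_scores = [threat_sentiment(r) for r in reviews]
print("LLM scores:", llm_scores)
print("agreement (Pearson r):", correlation(human_scores, llm_scores))

The same scoring function could be run over LLM-generated synthetic reviews, which is how the study sidesteps the ethics of collecting real submissions at scale.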


Toward an Insider Threat Education Platform: A Theoretical Literature Review

Gelman, Haywood, Hastings, John D., Kenley, David, Loiacono, Eleanor

arXiv.org Artificial Intelligence

Insider threats (InTs) within organizations are small in number but have a disproportionate ability to damage systems, information, and infrastructure. Existing InT research studies the problem from psychological, technical, and educational perspectives. Proposed theories include research on psychological indicators, machine learning, user behavioral log analysis, and educational methods to teach employees recognition and mitigation techniques. Because InTs are a human problem, training methods that address InT detection from a behavioral perspective are critical. While numerous technological and psychological theories exist on detection, prevention, and mitigation, few training methods prioritize psychological indicators. This literature review studied peer-reviewed InT research organized by subtopic and extracted critical theories from the psychological, technical, and educational disciplines. In doing so, it is the first study to comprehensively organize research across all three approaches in a manner that properly informs the development of an InT education platform.


Fine Grained Insider Risk Detection

Huber, Birkett, Neo, Casper, Sampson, Keiran, Kantchelian, Alex, Ksobiech, Brett, Pavlidis, Yanis

arXiv.org Artificial Intelligence

We present a method to detect departures from business-justified workflows among support agents. Our goal is to assist auditors in identifying agent actions that cannot be explained by the activity within their surrounding context, where normal activity patterns are established from historical data. We apply our method to help audit millions of actions of over three thousand support agents. We collect logs from the tools used by support agents and construct a bipartite graph of Actions and Entities representing all the actions of the agents, as well as background information about entities. From this graph, we sample subgraphs rooted on security-significant actions taken by the agents. Each subgraph captures the relevant context of the root action in terms of other actions, entities, and their relationships. We then prioritize the rooted subgraphs for auditor review using feed-forward and graph neural networks, as well as nearest-neighbor techniques. To alleviate the issue of scarce labeling data, we use contrastive learning and domain-specific data augmentations. Expert auditors label the top-ranked subgraphs as "worth auditing" or "not worth auditing" based on the company's business policies. This system finds subgraphs that are worth auditing with high enough precision to be used in production.
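
The core data structure here is the bipartite Action/Entity graph with context subgraphs sampled around security-significant root actions. A toy sketch using networkx follows; the node naming scheme, attributes, and "significant" flag are illustrative assumptions, not the paper's schema.

# Toy Action/Entity bipartite graph and rooted-subgraph sampling.
import networkx as nx

G = nx.Graph()
# Action nodes (one per logged agent action)
G.add_node("act:42", kind="action", verb="export_customer_data", significant=True)
G.add_node("act:41", kind="action", verb="open_ticket", significant=False)
# Entity nodes (agents, tickets, customers, ...)
G.add_nodes_from([("agent:7", {"kind": "entity"}),
                  ("ticket:991", {"kind": "entity"}),
                  ("customer:55", {"kind": "entity"})])
# Edges link each action to the entities it touches
G.add_edges_from([("act:42", "agent:7"), ("act:42", "customer:55"),
                  ("act:41", "agent:7"), ("act:41", "ticket:991")])

def rooted_subgraph(g: nx.Graph, root: str, radius: int = 2) -> nx.Graph:
    """Context of a root action: every node within `radius` hops."""
    return nx.ego_graph(g, root, radius=radius)

roots = [n for n, d in G.nodes(data=True)
         if d.get("kind") == "action" and d.get("significant")]
for root in roots:
    sub = rooted_subgraph(G, root)
    print(root, "->", sorted(sub.nodes))  # subgraph handed to the GNN ranker

Each such subgraph would then be featurized and scored by the feed-forward/GNN rankers described above, with contrastive pretraining compensating for the scarce auditor labels.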


FedAT: Federated Adversarial Training for Distributed Insider Threat Detection

Gayathri, R G, Sajjanhar, Atul, Uddin, Md Palash, Xiang, Yong

arXiv.org Artificial Intelligence

Insider threats usually originate from within the workplace, where the attacker is an entity closely associated with the organization. The sequence of actions that entities take on the resources to which they have access rights allows insiders to be identified. Insider Threat Detection (ITD) using Machine Learning (ML)-based approaches has gained attention in recent years. However, most techniques employ centralized ML methods to perform ITD. Organizations operating from multiple locations cannot contribute to centralized models because the data is generated in various locations. In particular, user behavior data, the primary source for ITD, cannot be shared among locations due to privacy concerns. Additionally, the data distributed across locations results in extreme class imbalance due to the rarity of attacks. Federated Learning (FL), a distributed data modeling paradigm, has gained much interest recently. However, FL-enabled ITD is not yet explored, and research is still needed on the significant issues of implementing it in practical settings. As such, our work investigates an FL-enabled multiclass ITD paradigm that considers non-Independent and Identically Distributed (non-IID) data to detect insider threats from different locations (clients) of an organization. Specifically, we propose a Federated Adversarial Training (FedAT) approach using a generative model to alleviate the extreme data skewness arising from the non-IID data distribution among the clients. In addition, we propose a Self-normalized Neural Network-based Multi-Layer Perceptron (SNN-MLP) model to improve ITD. We perform comprehensive experiments and compare the results with benchmarks to demonstrate the enhanced performance of the proposed FedAT-driven ITD scheme.
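
The SNN-MLP classifier mentioned above is, in essence, an MLP with SELU activations and AlphaDropout so that activations stay self-normalized (in the sense of Klambauer et al.'s self-normalizing networks). A minimal sketch, with layer sizes and class count as assumptions:

# Sketch of a self-normalizing MLP (SNN-MLP) for multiclass ITD.
import torch
import torch.nn as nn

class SNNMLP(nn.Module):
    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.SELU(), nn.AlphaDropout(0.05),
            nn.Linear(128, 64), nn.SELU(), nn.AlphaDropout(0.05),
            nn.Linear(64, n_classes),
        )
        # LeCun-normal init (std = 1/sqrt(fan_in)) keeps activations
        # self-normalizing under SELU.
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.kaiming_normal_(m.weight, nonlinearity="linear")
                nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.net(x)

model = SNNMLP(n_features=40, n_classes=5)   # assumed multiclass ITD head
logits = model(torch.randn(8, 40))
print(logits.shape)  # torch.Size([8, 5])

In the FedAT setting, each client would train a copy of such a model locally and share only parameter updates, with the generative adversarial component rebalancing the rare attack classes.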


ZETAR: Modeling and Computational Design of Strategic and Adaptive Compliance Policies

Huang, Linan, Zhu, Quanyan

arXiv.org Artificial Intelligence

Compliance management plays an important role in mitigating insider threats. Incentive design is a proactive and non-invasive approach to achieving compliance by aligning an insider's incentive with the defender's security objective, which motivates (rather than commands) an insider to act in the organization's interests. Controlling insiders' incentives for population-level compliance is challenging because those incentives are neither precisely known nor directly controllable. To this end, we develop ZETAR, a zero-trust audit and recommendation framework, to provide a quantitative approach to modeling insiders' incentives and designing customized recommendation policies that improve their compliance. We formulate primal and dual convex programs to compute the optimal bespoke recommendation policies. We create the theoretical underpinning for understanding trust, compliance, and satisfaction, which leads to scoring mechanisms for how compliant and persuadable an insider is. After classifying insiders as malicious, self-interested, or amenable based on their incentive misalignment levels with the defender, we establish bespoke information disclosure principles for these insiders of different incentive categories. We identify the policy separability principle and the set convexity, which enable finite-step algorithms to efficiently learn the Completely Trustworthy (CT) policy set when insiders' incentives are unknown. Finally, we present a case study to corroborate the design. Our results show that ZETAR adapts well to insiders with different risk and compliance attitudes and significantly improves compliance. Moreover, trustworthy recommendations provably promote cyber hygiene and insiders' satisfaction.
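
The recommendation policies such a framework computes are solutions to convex programs with obedience (incentive-compatibility) constraints. The following toy linear program illustrates the general shape of this design in a two-state, two-action setting; the payoff numbers and setup are invented for illustration and are not the paper's formulation or case study.

# Toy recommendation-policy design as a linear program: choose a joint
# distribution pi(state, recommendation) so that an insider who knows
# the policy is willing to follow each recommendation (obedience), while
# maximizing the defender's expected utility. Payoffs are invented.
import numpy as np
from scipy.optimize import linprog

prior = np.array([0.3, 0.7])          # P(audit), P(no audit)
U_I = np.array([[0.0, -2.0],          # insider payoff [state][action]
                [0.0,  1.0]])         # actions: 0=comply, 1=violate
U_D = np.array([[1.0, -3.0],
                [1.0, -3.0]])         # defender payoff [state][action]

# Variables x[s, a] = P(state=s, recommend=a), flattened row-major.
c = -U_D.flatten()                    # linprog minimizes, so negate
A_eq = np.array([[1, 1, 0, 0],        # marginal over state "audit"
                 [0, 0, 1, 1]])       # marginal over state "no audit"
b_eq = prior
# Obedience: when told a, deviating to a' must not pay (written <= 0).
A_ub = np.array([
    [-(U_I[0, 0] - U_I[0, 1]), 0, -(U_I[1, 0] - U_I[1, 1]), 0],  # rec comply
    [0, -(U_I[0, 1] - U_I[0, 0]), 0, -(U_I[1, 1] - U_I[1, 0])],  # rec violate
])
b_ub = np.zeros(2)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4)
print("optimal joint pi(state, rec):\n", res.x.reshape(2, 2))
print("defender expected utility:", -res.fun)

In this toy instance the optimal policy recommends compliance with total probability 0.9 and yields defender utility 0.6, versus -1.8 under full disclosure, which conveys why strategically designed recommendations can outperform naive transparency; ZETAR's primal/dual programs generalize this to bespoke policies per insider category.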


Deep Learning techniques for Cyber Security - DataScienceCentral.com

#artificialintelligence

For the first time, I taught an AI for Cyber Security course at the University of Oxford. I referred to a paper from Johns Hopkins that covered deep neural networks for cyber security (A Survey of Deep Learning Methods for Cyber Security); references below, where you can download the full paper for free. Detecting and Classifying Malware: The number and variety of malware attacks are continually increasing, making it more difficult to defend against them using standard methods. DL provides an opportunity to build generalizable models to detect and classify malware autonomously. There are a number of ways to detect malware.


5 AI and Cybersecurity Predictions for 2022

#artificialintelligence

While most approaches to cybersecurity remain stuck in the past, using rules, signatures, and other historically defined understandings of threat, best practice these days is to keep your focus forward, preparing for the unknown and the unpredictable. In that spirit, Darktrace anticipates what 2022 will bring, both in terms of the threat landscape and for evolutions in defensive technologies. Focusing on augmenting the human with AI is just as important as the cutting-edge mathematics that drive AI. The relationship between humans and AI can be improved with explainable artificial intelligence (XAI). In cybersecurity, this means delivering the insights of AI to the security team on a silver platter: in human-readable language and clear diagrams rather than abstruse code.


Insider Detection using Deep Autoencoder and Variational Autoencoder Neural Networks

Pantelidis, Efthimios, Bendiab, Gueltoum, Shiaeles, Stavros, Kolokotronis, Nicholas

arXiv.org Artificial Intelligence

Insider attacks are one of the most challenging cybersecurity issues for companies, businesses, and critical infrastructures. Despite implemented perimeter defences, the risk of this kind of attack remains very high. In fact, detecting insider attacks is a very complicated security task and presents a serious challenge to the research community. In this paper, we address this issue using two deep learning algorithms, the Autoencoder and the Variational Autoencoder. We especially investigate the usefulness of applying these algorithms to automatically defend against potential internal threats, without human intervention. The effectiveness of the two models is evaluated on the public CERT Insider Threat Test dataset (CERT r4.2), which includes both benign and malicious activities generated by 1,000 simulated users. Comparison with other models shows that the Variational Autoencoder neural network provides the best overall performance, with greater detection accuracy and a reasonable false positive rate.
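
A minimal sketch of the variational-autoencoder detector this abstract evaluates: train on benign activity vectors and flag samples with high reconstruction error. Dimensions, the random stand-in data, and the threshold rule are assumptions, not values tuned to CERT r4.2.

# Sketch of a VAE-based insider detector (PyTorch).
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_features: int, latent: int = 4):
        super().__init__()
        self.enc = nn.Linear(n_features, 16)
        self.mu = nn.Linear(16, latent)
        self.logvar = nn.Linear(16, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 16), nn.ReLU(),
                                 nn.Linear(16, n_features))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        return self.dec(z), mu, logvar

def loss_fn(x, recon, mu, logvar):
    recon_err = ((recon - x) ** 2).sum(dim=1)                 # reconstruction
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)  # KL prior
    return (recon_err + kl).mean()

model = VAE(n_features=24)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
benign = torch.randn(256, 24)              # stand-in for benign activity
for _ in range(200):
    opt.zero_grad()
    recon, mu, logvar = model(benign)
    loss_fn(benign, recon, mu, logvar).backward()
    opt.step()
with torch.no_grad():
    recon, _, _ = model(benign)
    scores = ((recon - benign) ** 2).mean(dim=1)
    threshold = scores.mean() + 3 * scores.std()   # assumed outlier rule
print("alert if reconstruction error >", float(threshold))

The difference from a plain autoencoder is the KL-regularized latent space, which the paper's results suggest yields better separation of malicious from benign activity on this benchmark.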